Abstract

The (bounded) hairpin completion and its iterated versions are operations on formal lan- 
guages which have been inspired by the hairpin formation in DNA-biochemistry. The paper 
answers two questions asked in the literature about the iterated hairpin completion. 

The first question is whether the class of regular languages is closed under iterated bounded 
hairpin completion. Here we show that this is true by providing a more general result which 
applies to all the classes of languages which are closed under finite union, intersection with 
regular sets, and concatenation with regular sets. In particular, all Chomsky classes and all 
standard complexity classes are closed under iterated bounded hairpin completion. 

In the second part of the paper we address the question whether the iterated hairpin
completion of a singleton is always regular. In contrast to the first question, this one has
a negative answer. We exhibit an example of a singleton language whose iterated hairpin
completion is not regular; in fact, it is not even context-free, although it is context-sensitive.

Keywords: Formal languages, Finite automata, Hairpin completion, Bounded hairpin completion 

1 Introduction 

The hairpin completion is an operation on formal languages which is inspired by DNA-computing 
and biochemistry where it appears naturally in chemical reactions. It turned out that the corre- 
sponding operation on formal languages gives rise to very interesting and quite subtle decidability 
and computational problems. The focus of this paper is on these formal language theoretical 
results. However, let us sketch the biochemical origin of this operation first. 

A DNA strand is a polymer composed of nucleotides which differ from each other by their 
bases A (adenine), C (cytosine), G (guanine), and T (thymine). For our purposes a strand can 
be seen as a finite sequence of bases. By Watson-Crick base pairing two base sequences can bind 
to each other if they are pairwise complementary, where A is complementary to T and C to G. 
The hairpin completion is best explained by Figure 1. By $\overline{w}$ we always mean to read
$w$ from right to left and to complement base by base, i.e.,
$\overline{a_1 \cdots a_n} = \overline{a_n} \cdots \overline{a_1}$. During a chemical
process, called annealing, a strand which contains a sequence $\alpha$ and ends on the complementary
sequence $\overline{\alpha}$, Fig. 1(a), can form an intramolecular base-pairing which is known as a hairpin (in case
$\alpha$ is not too short, say $|\alpha| > 10$), see Fig. 1(b). By complementing the unbound sequence $\gamma$, the
hairpin completion arises, Fig. 1(c).

Hairpin completions of strands develop naturally during a technique called Polymerase Chain 
Reaction (PCR). The PCR is often used in DNA algorithms to amplify DNA strands with certain 
properties. In many algorithms which use PCR the hairpin completions are by-products which
cannot be used for the subsequent computation. Therefore, sets of strands which are unlikely
to build hairpins (or lead to other bad hybridizations) have been examined in many papers, see
e.g., [2,4,5,10,11].

Figure 1: Hairpin completion of a DNA-strand, with panels (a) strand, (b) hairpin, and (c) hairpin completion.

On the other hand, some DNA-based computations rely on the fact that DNA strands can form 
hairpins. Examples are algorithms using the Whiplash PCR in which strands are designed to
build hairpins. This technique can be used to solve combinatorial problems, including NP-complete 
ones like Satisfiability and Hamiltonian Path, see [6,18,19]. 

On an abstract level a strand can be seen as a word and a (possibly infinite) set of strands is 
a language. The hairpin completion of formal languages has been introduced in [1] by Cheptea,
Martin-Vide, and Mitrana. In several papers the hairpin completion and some related operations
have been studied, see e.g., [1,3,14-17]. The focus of this paper is on closure properties of
language classes concerning the iterated versions of the hairpin completion and the bounded hairpin
completion. For the latter operation we assume the length of the $\gamma$-part to be bounded. This
variant of the hairpin completion was introduced and analyzed in [8,9] by Ito, Leupold, Manea,
and Mitrana. A formal definition of both operations is given in Section 2.1.

In [1] the closure properties of different language classes under the non-iterated and iterated 
hairpin completion have been analyzed. It follows that neither regular nor context-free languages 
are closed under hairpin completion whereas the family of context-sensitive languages is closed 
under this operation. Actually, from [1] we can derive that the class DSPACE($f$) (resp. the class
NSPACE($f$)) is closed under hairpin completion (resp. closed under iterated hairpin completion)
for every function $f \in \Omega(\log)$. (By the class DSPACE($f$) (resp. NSPACE($f$)) we mean, as usual,
the class of languages that can be accepted by a deterministic (resp. non-deterministic) Turing
machine which uses $f(n)$ work space on input length $n$.) In particular, the class of context-sensitive
languages is closed under iterated hairpin completion, too. Furthermore, if we apply the iterated
hairpin completion to a regular (resp. context-free) language we stay inside NL (= NSPACE($\log$))
(resp. NSPACE($\log^2$), by Lewis, Stearns, and Hartmanis [13]) which is in terms of space complexity
far below the class of deterministic context-sensitive languages.

The situation changes if we consider the bounded hairpin completion, which can be seen as a 
weaker variant of the hairpin completion. All classes in the Chomsky Hierarchy are closed under 
bounded hairpin completion and the classes of context-free, context-sensitive, and recursively 
enumerable languages are closed under the iterated operation, see [8,9]. But the status for regular 
languages remained unknown and was stated as an open problem in [9]. In Section 3 we solve 
this problem. We state a general representation for the iterated bounded hairpin completion of 
a formal language using the operations union, intersection with regular sets, and concatenation 
with regular sets (Theorem 3.1). As a consequence all language classes which are closed under 
these basic operations are also closed under iterated bounded hairpin completion. 

Furthermore, for a given non-deterministic finite automaton (NFA) accepting a language L, we 
give exponential lower and upper bounds for the size of an NFA accepting the iterated bounded 
hairpin completion of L in Theorem 4.1. Thus, if we ignore constants, the NFA leads us to a linear 
time membership test for the iterated bounded hairpin completion of a fixed regular language. This
improves a quadratic bound which was known before. Indeed, the best known time complexity of
the membership problem for the iterated (unbounded) hairpin completion of a regular language
$L$ is still quadratic, by an algorithm from [14]. See Section 4 for a more detailed discussion.

The class of iterated hairpin completions of singletons (HCS) has been investigated in [17] 
by Manca, Mitrana, and Yokomori (which is the journal version of a paper that appeared at 
AFL 2008). Obviously, HCS is included in the class of context-sensitive languages. However,
the question whether HCS contains non-regular or non-context-free languages remained unsolved. In
Section 5 we answer this question by stating a singleton whose iterated hairpin completion is not
context-free.

This paper is the journal version of results which appeared as a poster at DLT 2010, [12]. 

2 Definitions and Notation 

We assume the reader to be familiar with the fundamental concepts of formal language and 
automata theory, see [7]. 

An alphabet is a finite set of letters. In this paper the alphabet is always $\Sigma$. The set of words
over $\Sigma$ is denoted by $\Sigma^*$, as usual, and the empty word is denoted by $\varepsilon$. We consider $\Sigma$ with an
involution; this is a bijection $\overline{\phantom{a}} : \Sigma \to \Sigma$ such that $\overline{\overline{a}} = a$ for all letters $a \in \Sigma$
(in DNA-biochemistry: $\Sigma = \{A, C, G, T\}$ with $\overline{A} = T$ and $\overline{C} = G$). We extend the involution to
words $w = a_1 \cdots a_n$ by $\overline{w} = \overline{a_n} \cdots \overline{a_1}$. (Just like taking inverses in groups.) For a
formal language $L$ we denote by $\overline{L}$ the language $\{\overline{w} \mid w \in L\}$.

Given a word $w$, we denote by $|w|$ its length. For a length bound $\ell \geq 0$ the set $\Sigma^{\leq \ell}$ contains
all words of length at most $\ell$. If $w = xyz$ for some $x, y, z \in \Sigma^*$, then $x$, $y$, and $z$ are called prefix,
factor, and suffix of the word $w$, respectively. For the prefix relation we also use the notation
$x \leq w$. Note that if $z$ is a suffix of $w$, then $\overline{z}$ is a prefix of $\overline{w}$ (or $\overline{z} \leq \overline{w}$).

A common way to describe regular languages is by non-deterministic finite automata (NFAs).
An NFA $A$ is a tuple $(Q, \Sigma, E, I, F)$ where $Q$ is the finite set of states, $I \subseteq Q$ is the set of initial
states, $F \subseteq Q$ is the set of final states, and $E \subseteq Q \times \Sigma \times Q$ is the set of labelled edges or transitions.
The language accepted by the automaton, denoted by $L(A)$, contains all words $w$ such that there
is a path labelled by $w$ which leads from an initial state to a final state. By the size of an NFA
we mean the number of states $|Q|$.

2.1 The Hairpin Completion 

Let $w \in \Sigma^*$ be a word. If $w$ has a factorization $w = \gamma\alpha\beta\overline{\alpha}$, it can form a hairpin and
$\gamma\alpha\beta\overline{\alpha}\,\overline{\gamma}$ is a right hairpin completion of $w$ (again, see Figure 1). Since a hairpin in
biochemistry is stable only if $\alpha$ is long enough, we fix a constant $k \geq 1$ and require $|\alpha| = k$.
(Note that the definition does not change if we require $|\alpha| \geq k$.)

Symmetrically, if $w$ has a factorization $w = \alpha\beta\overline{\alpha}\,\overline{\gamma}$ with $|\alpha| = k$, then
$\gamma\alpha\beta\overline{\alpha}\,\overline{\gamma}$ is a left hairpin completion of $w$. For the bounded hairpin completion we assume
that the length of the factor $\gamma$ is bounded by some constant.

The hairpin completion of a formal language L is the union of all hairpin completions of 
all words in L. Before we state the formal definition of the unbounded and bounded hairpin 
completion of a language, we introduce a more general variant of the hairpin completion, namely 
the parameterized hairpin completion. The parameterized hairpin completion covers the other 
operations as special cases. 
Let $\ell, r \in \mathbb{N} \cup \{\infty\}$ be two length bounds and let $L$ be a formal language. Considering a left
hairpin completion with the factorization $\gamma\alpha\beta\overline{\alpha}\,\overline{\gamma}$ as above, the bound $\ell$ limits the length
of $\gamma$; respectively, the bound $r$ limits the length of $\gamma$ in a right hairpin completion. For a word
$\alpha \in \Sigma^k$ the parameterized hairpin completion is defined as

$$\mathcal{H}_\alpha(L,\ell,0) = \bigcup_{\gamma \in \Sigma^{\leq \ell}} \gamma\,(\alpha\Sigma^*\overline{\alpha}\,\overline{\gamma} \cap L),$$

$$\mathcal{H}_\alpha(L,0,r) = \bigcup_{\gamma \in \Sigma^{\leq r}} (\gamma\alpha\Sigma^*\overline{\alpha} \cap L)\,\overline{\gamma},$$

$$\mathcal{H}_\alpha(L,\ell,r) = \mathcal{H}_\alpha(L,\ell,0) \cup \mathcal{H}_\alpha(L,0,r).$$

For the constant $k$ we define

$$\mathcal{H}_k(L,\ell,r) = \bigcup_{\alpha \in \Sigma^k} \mathcal{H}_\alpha(L,\ell,r).$$

In the unbounded case we distinguish two operations: The (two-sided) hairpin completion is
defined as $\mathcal{H}_k(L) = \mathcal{H}_k(L,\infty,\infty)$ and the right-sided hairpin completion is defined as
$\mathcal{RH}_k(L) = \mathcal{H}_k(L,0,\infty)$. For the latter case we allow right hairpin completions only. In the same
way we might define the left-sided hairpin completion of a language, but for convenience we will treat
the right-sided operation only, and also refer to it as the one-sided hairpin completion. It is plain
that our results also hold for the left-sided case.

The bounded hairpin completion $\mathcal{H}_k(L,m,m)$ arises if we choose the same finite bound $m \in \mathbb{N}$
for left and right hairpin completions.

Note that if both bounds $\ell, r$ are finite and $L$ is regular, then the parameterized hairpin
completion $\mathcal{H}_k(L,\ell,r)$ is regular as well. This does not hold if $\ell = \infty$ or $r = \infty$ as one of the
unions becomes infinite. It is known that the unbounded hairpin completion of a regular language
is not necessarily regular but always linear context-free, see e.g., [1].

In this paper we examine the iterated versions of the operations we defined so far. The iterated
hairpin completion of a language $L$ contains all words which belong to a sequence $w_0, \ldots, w_n$ where
$w_0 \in L$ and where $w_i$ is a right or left hairpin completion of $w_{i-1}$ and the bound $r$ (resp. $\ell$) applies
for all $i$ such that $1 \leq i \leq n$. More formally, let $\ell, r \in \mathbb{N} \cup \{\infty\}$ and

$$\mathcal{H}^0_\alpha(L,\ell,r) = L, \qquad \mathcal{H}^i_\alpha(L,\ell,r) = \mathcal{H}_\alpha(\mathcal{H}^{i-1}_\alpha(L,\ell,r),\ell,r),$$

$$\mathcal{H}^0_k(L,\ell,r) = L, \qquad \mathcal{H}^i_k(L,\ell,r) = \mathcal{H}_k(\mathcal{H}^{i-1}_k(L,\ell,r),\ell,r)$$

for $i \geq 1$. The iterated parameterized hairpin completion of $L$ is the union

$$\mathcal{H}^*_\alpha(L,\ell,r) = \bigcup_{i \geq 0} \mathcal{H}^i_\alpha(L,\ell,r) \quad\text{resp.}\quad \mathcal{H}^*_k(L,\ell,r) = \bigcup_{i \geq 0} \mathcal{H}^i_k(L,\ell,r).$$

If a word $z$ is included in $\mathcal{H}^i_k(\{w\},\ell,r)$, we say $z$ is an $i$-iterated hairpin completion of $w$, and
if $z \in \mathcal{H}^*_k(\{w\},\ell,r)$, we say $z$ is an iterated hairpin completion of $w$. (It will be clear from the
context which length bounds apply.)

The iterated unbounded hairpin completions are denoted by $\mathcal{H}^*_k(L) = \mathcal{H}^*_k(L,\infty,\infty)$ and
$\mathcal{RH}^*_k(L) = \mathcal{H}^*_k(L,0,\infty)$.
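The definitions above can be tried out on concrete words. The following is a minimal Python sketch, not taken from the paper: it assumes that the involution is modelled by switching letter case (`a` and `A` are complementary), and the names `comp`, `one_step`, and `iterated` as well as the cut-off `max_len` are illustrative. Since a hairpin completion never shortens a word, restricting the search to words of length at most `max_len` loses no shorter element of the iterated completion.

```python
def comp(word):
    """Read the word from right to left and complement every letter (a <-> A)."""
    return word[::-1].swapcase()


def one_step(w, k, left_bound, right_bound):
    """All one-step parameterized hairpin completions of the single word w."""
    n = len(w)
    out = set()
    for g in range(0, n - 2 * k + 1):  # g = |gamma|, leaving room for alpha and its complement
        # right completion: w = gamma alpha beta comp(alpha)
        if g <= right_bound and w[g:g + k] == comp(w[n - k:]):
            out.add(w + comp(w[:g]))
        # left completion: w = alpha beta comp(alpha) comp(gamma)
        if g <= left_bound and w[:k] == comp(w[n - g - k:n - g]):
            out.add(comp(w[n - g:]) + w)
    return out


def iterated(language, k, left_bound, right_bound, max_len):
    """Words of length <= max_len in the iterated parameterized hairpin completion."""
    found = {w for w in language if len(w) <= max_len}
    frontier = list(found)
    while frontier:
        w = frontier.pop()
        for z in one_step(w, k, left_bound, right_bound):
            if len(z) <= max_len and z not in found:
                found.add(z)
                frontier.append(z)
    return found


if __name__ == "__main__":
    print(sorted(iterated({"abaA"}, k=1, left_bound=0, right_bound=2, max_len=8)))
```

With `left_bound=0` only right hairpin completions with a non-empty $\gamma$ are possible, so the sketch enumerates a finite part of a one-sided iterated completion.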

Example. Figure 2 shows a 3-iterated hairpin completion of auava where |a| = k. In each step
the dotted part is the newly created prefix or suffix.

Figure 2: Example for the iterated hairpin completion.



3 The Iterated Bounded Hairpin Completion 

In this section we will give a general representation for the iterated parameterized hairpin com- 
pletion with finite bounds. Our main result is the proof of the following theorem which can be 
found in Section 3.2. 

Theorem 3.1. Let $L$ be a formal language and $\ell, r \in \mathbb{N}$. The iterated parameterized hairpin
completion $\mathcal{H}^*_k(L,\ell,r)$ can be effectively represented by an expression using $L$ and the operations
union, intersection with regular sets, and concatenation with regular sets.

Consequently, all language classes which are closed under these operations are also closed
under iterated parameterized hairpin completion with finite bounds, and if the closure under all
three operations is effective, then the closure under iterated parameterized hairpin completion
with finite bounds is effective, too; this applies to all four Chomsky classes. From [9] it is known
that the classes of context-free, context-sensitive, and recursively enumerable languages are closed
under iterated bounded hairpin completion, but the status for regular languages was unknown.
Since the iterated bounded hairpin completion is a special case of the iterated parameterized
hairpin completion with finite bounds we can answer this question now.

Corollary 3.2. Let $\mathcal{C}$ be a class of languages. If $\mathcal{C}$ is closed under union, intersection with regular
sets, and concatenation with regular sets, then $\mathcal{C}$ is also closed under iterated bounded hairpin
completion. Moreover, if $\mathcal{C}$ is effectively closed under union, intersection with regular sets, and
concatenation with regular sets, then the closure under iterated bounded hairpin completion is
effective.

In particular, the class of regular languages is effectively closed under iterated bounded hairpin 
completion. 

The next two sections are devoted to the proof of Theorem 3.1. First we introduce the impor- 
tant concept of a-prefixes. 

3.1 a-Prefixes 

Let $\alpha$ be a word of length $k$. For $v, w \in \Sigma^*$ we say $v$ is an $\alpha$-prefix of $w$ if $v\alpha \leq w$. We denote the
set of all $\alpha$-prefixes of length at most $\ell$ by

$$\mathcal{P}_\alpha(w,\ell) = \{v \mid v\alpha \leq w \wedge |v| \leq \ell\}.$$

The idea behind this notation is: For a word $w \in \alpha\Sigma^*\overline{\alpha}$ with $|w| - k \geq \ell, r$, the set of
(non-iterated) parameterized hairpin completions of $w$ is given by

$$\mathcal{H}_\alpha(\{w\},\ell,0) = \mathcal{P}_\alpha(\overline{w},\ell)\,w \quad\text{and}\quad \mathcal{H}_\alpha(\{w\},0,r) = w\,\overline{\mathcal{P}_\alpha(w,r)}.$$

In the following proof we are interested in $\alpha$-prefixes of words which have $\alpha$ as a prefix. This
leads to some useful properties.

Lemma 3.3. Let $\alpha \in \Sigma^k$, $\ell \in \mathbb{N}$, and $w \in \alpha\Sigma^*$.

1. For all $v \in \mathcal{P}_\alpha(w,\ell)$ we have $\alpha \leq v\alpha$.

2. For all $u, v \in \mathcal{P}_\alpha(w,\ell)$ we have

$$|u| \leq |v| \iff u\alpha \leq v\alpha \iff u \in \mathcal{P}_\alpha(v\alpha,\ell).$$

3. If $v\alpha$ is a prefix of some word in $\mathcal{P}_\alpha(w,\ell)^*\alpha$, then $v \in \mathcal{P}_\alpha(w,\ell)^*$.

Proof. If two words $x, y$ are prefixes of $w$ and $|x| \leq |y|$, then $x \leq y$. This yields properties 1 and
2.

For property 3 let $v\alpha \leq x_1 \cdots x_m\alpha$ where $x_1, \ldots, x_m \in \mathcal{P}_\alpha(w,\ell)$. We can factorize
$v = x_1 \cdots x_{i-1}y$ such that $y \leq x_i$ for some $i$ with $1 \leq i \leq m$. By property 1 and induction, we see that
$\alpha$ is a prefix of $x_{i+1} \cdots x_m\alpha$ and hence $y\alpha \leq x_i\alpha \leq w$ which implies $y \in \mathcal{P}_\alpha(w,\ell)$ and, moreover,
$v \in \mathcal{P}_\alpha(w,\ell)^*$. □
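The set of $\alpha$-prefixes is easy to compute for a concrete word, which makes the second property of Lemma 3.3 and the description of right hairpin completions checkable on small examples. The following is a minimal Python sketch under the assumption that complementation is modelled by switching letter case; the helper names `comp` and `alpha_prefixes` and the sample word are illustrative only.

```python
def comp(word):
    """Read the word from right to left and complement every letter (a <-> A)."""
    return word[::-1].swapcase()


def alpha_prefixes(w, alpha, bound):
    """All v with |v| <= bound such that v + alpha is a prefix of w."""
    last = min(bound, len(w) - len(alpha))
    return {w[:i] for i in range(last + 1) if w.startswith(alpha, i)}


if __name__ == "__main__":
    w, alpha, bound = "abaAacA", "a", 4
    prefixes = alpha_prefixes(w, alpha, bound)
    print(sorted(prefixes, key=len))

    # Right hairpin completions of w: append the complement of each alpha-prefix.
    print(sorted(w + comp(v) for v in prefixes))

    # Property 2 of Lemma 3.3 on this example.
    for u in prefixes:
        for v in prefixes:
            assert (len(u) <= len(v)) == (u in alpha_prefixes(v + alpha, alpha, bound))
```

The empty $\alpha$-prefix corresponds to the trivial completion that leaves the word unchanged.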

3.2 Proof of Theorem 3.1 

Let $L$ be a formal language and $\ell, r \in \mathbb{N}$. We will state a representation for $\mathcal{H}^*_k(L,\ell,r)$ using $L$
and the operations union, intersection with regular sets, and concatenation with regular sets.

Let us begin with a basic observation. Every word $w$ which is a hairpin completion of some
other word has a factorization $w = \delta\beta\overline{\delta}$ with $|\delta| \geq k$; therefore, the prefix of $w$ of length $k$ and the
suffix of $w$ of length $k$ are complementary. Let us call this prefix $\alpha$; hence, we have $w \in \alpha\Sigma^*\overline{\alpha}$.
Every word which is a right hairpin completion of $w$ still has the prefix $\alpha$ and, since the suffix
of length $k$ is complementary, it has the suffix $\overline{\alpha}$ as well. For left hairpin completions we have a
symmetric argument and, by induction, every word which is an iterated hairpin completion of $w$
has prefix $\alpha$ and suffix $\overline{\alpha}$.

Thus, we can split up the (non-iterated) parameterized hairpin completion $\mathcal{H}_k(L,\ell,r)$ into
finitely many languages $L_\alpha = \mathcal{H}_k(L,\ell,r) \cap \alpha\Sigma^*\overline{\alpha}$ where $\alpha \in \Sigma^k$, and each of them has an effective
representation using $L$ and the operations union, intersection with regular sets, and concatenation
with regular sets. Moreover,

$$\mathcal{H}^*_k(L_\alpha,\ell,r) = \mathcal{H}^*_\alpha(L_\alpha,\ell,r) \subseteq \alpha\Sigma^*\overline{\alpha}$$

and the iterated parameterized hairpin completion equals

$$\mathcal{H}^*_k(L,\ell,r) = L \cup \mathcal{H}^*_k(\mathcal{H}_k(L,\ell,r),\ell,r) = L \cup \mathcal{H}^*_k\Big(\bigcup_{\alpha \in \Sigma^k} L_\alpha,\ell,r\Big) = L \cup \bigcup_{\alpha \in \Sigma^k} \mathcal{H}^*_\alpha(L_\alpha,\ell,r).$$

Henceforth, let $\alpha \in \Sigma^k$ be fixed. In order to prove Theorem 3.1 we will state a suitable
representation for $\mathcal{H}^*_\alpha(L_\alpha,\ell,r)$. For the rest of the proof we will heavily rely on the fact that every
word in $\mathcal{H}^*_\alpha(L_\alpha,\ell,r)$ has the prefix $\alpha$ and the suffix $\overline{\alpha}$. The representation is defined recursively.
We have

$$\mathcal{H}^*_\alpha(L_\alpha,0,0) = L_\alpha.$$

By symmetry, we may assume that $\ell \geq r$ and $\ell \geq 1$. We will state a representation for
$\mathcal{H}^*_\alpha(L_\alpha,\ell,r)$ using $\mathcal{H}^*_\alpha(L_\alpha,\ell-1,r)$ and the operations union, intersection with regular sets, and
concatenation with regular sets. Therefore, consider a word

$$z \in \mathcal{H}^*_\alpha(L_\alpha,\ell,r) \setminus \mathcal{H}^*_\alpha(L_\alpha,\ell-1,r).$$
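For some $n \geq 1$ there is a sequence $w_0, \ldots, w_n = z$ where $w_0 \in L_\alpha$ and for all $i$ such that
$1 \leq i \leq n$ either $w_i$ is a left hairpin completion of $w_{i-1}$ and $|w_i| \leq |w_{i-1}| + \ell$ or $w_i$ is a right
hairpin completion of $w_{i-1}$ and $|w_i| \leq |w_{i-1}| + r$. Furthermore, there is an index $j \geq 1$ such that
$w_{j-1} = w \in \mathcal{H}^*_\alpha(L_\alpha,\ell-1,r)$ and $w_j = vw \notin \mathcal{H}^*_\alpha(L_\alpha,\ell-1,r)$. Note that this implies $|v| = \ell$ and
$w \in \alpha\Sigma^*\overline{\alpha}\,\overline{v}$. Let $s = n - j$ and consider the factorization

$$z = x_s \cdots x_1 v w \overline{y_1} \cdots \overline{y_s}$$

where $x_i \cdots x_1 v w \overline{y_1} \cdots \overline{y_i} = w_{j+i}$ and either

1. $y_i = \varepsilon$, $|x_i| \leq \ell$, and $x_i\alpha \leq y_{i-1} \cdots y_1 v\alpha$ or

2. $x_i = \varepsilon$, $|y_i| \leq r$, and $y_i\alpha \leq x_{i-1} \cdots x_1 v\alpha$

for all $i$ such that $1 \leq i \leq s$.

The crucial point is that $vw$ has the prefix $v\alpha$, the suffix $\overline{\alpha}\,\overline{v}$, and $|v| = \ell \geq r$. Therefore, the
factors $x_1, \ldots, x_s$ and $y_1, \ldots, y_s$ are controlled by the triple $(v, \ell, r)$ in the following way.

Lemma 3.4. $x_i \in \mathcal{P}_\alpha(v\alpha,\ell)^*$ and $y_i \in \mathcal{P}_\alpha(v\alpha,r)^*$ for all $i$ such that $1 \leq i \leq s$.

Proof. We prove the claim by induction on $i$. Let $i$ be such that $1 \leq i \leq s$. Our induction hypothesis
is $x_j \in \mathcal{P}_\alpha(v\alpha,\ell)^*$ and $y_j \in \mathcal{P}_\alpha(v\alpha,r)^*$ for all $j$ such that $1 \leq j < i$. We distinguish between the
two cases above:

1. We have $y_i = \varepsilon \in \mathcal{P}_\alpha(v\alpha,r)^*$ and, by induction hypothesis,

$$x_i\alpha \leq y_{i-1} \cdots y_1 v\alpha \in \mathcal{P}_\alpha(v\alpha,r)^* v\alpha \subseteq \mathcal{P}_\alpha(v\alpha,\ell)^*\alpha.$$

In combination with Lemma 3.3 this yields $x_i \in \mathcal{P}_\alpha(v\alpha,\ell)^*$.

2. We have $x_i = \varepsilon \in \mathcal{P}_\alpha(v\alpha,\ell)^*$ and

$$y_i\alpha \leq x_{i-1} \cdots x_1 v\alpha \in \mathcal{P}_\alpha(v\alpha,\ell)^*\alpha,$$

hence $y_i \in \mathcal{P}_\alpha(v\alpha,\ell)^*$. Since $|y_i| \leq r$, all factors of $y_i$ are at most of length $r$, too, and
$y_i \in \mathcal{P}_\alpha(v\alpha,r)^*$. □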

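For $u \in \Sigma^\ell$ let us define the language

$$\mathcal{L}_\alpha(u,\ell,r) = \mathcal{P}_\alpha(u\alpha,\ell)^*\, u\, \big(\mathcal{H}^*_\alpha(L_\alpha,\ell-1,r) \cap \alpha\Sigma^*\overline{\alpha}\,\overline{u}\big)\, \overline{\mathcal{P}_\alpha(u\alpha,r)^*}.$$

Note that, by induction, for every $u$ the representation for $\mathcal{L}_\alpha(u,\ell,r)$ is effectively given. By
Lemma 3.4, the word $z$ is included in $\mathcal{L}_\alpha(v,\ell,r)$ and for every word
$z' \in \mathcal{H}^*_\alpha(L_\alpha,\ell,r) \setminus \mathcal{H}^*_\alpha(L_\alpha,\ell-1,r)$ there exists $v' \in \Sigma^\ell$ such that $z' \in \mathcal{L}_\alpha(v',\ell,r)$. Therefore,

$$\mathcal{H}^*_\alpha(L_\alpha,\ell,r) \subseteq \mathcal{H}^*_\alpha(L_\alpha,\ell-1,r) \cup \bigcup_{u \in \Sigma^\ell} \mathcal{L}_\alpha(u,\ell,r)$$

and for the right hand side we have an effective representation. Of course, we intend to replace
the inclusion by an equality sign.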

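Lemma 3.5. $\mathcal{L}_\alpha(u,\ell,r) \subseteq \mathcal{H}^*_\alpha(L_\alpha,\ell,r)$ for all $u \in \Sigma^\ell$.

Proof. We start by proving a special case of the claim that is used later to derive the
result. Consider a word $w'$ together with the factorization

$$w' = x_m \cdots x_1 w \overline{y_1} \cdots \overline{y_n}$$

with $m \geq 0$, $n \geq 1$ and where for some word $u \in \Sigma^*$

1. $w \in \mathcal{H}^*_\alpha(L_\alpha,\ell,r) \cap u\alpha\Sigma^*\overline{\alpha}\,\overline{u}$,

2. $x_1, \ldots, x_m \in \mathcal{P}_\alpha(u\alpha,\ell)$,

3. $y_1, \ldots, y_n \in \mathcal{P}_\alpha(u\alpha,r)$, and

4. $m = 0$ or $|y_j| \leq |x_m|$ for all $j$ such that $1 \leq j \leq n$.

We claim $w' \in \mathcal{H}^*_\alpha(L_\alpha,\ell,r)$, too. Indeed, if $m = 0$, it is plain that $w'$ is an $n$-iterated right hairpin
completion of $w$. Otherwise $x_m \cdots x_1 w$ is an $m$-iterated left hairpin completion of $w$. By the
fourth property and Lemma 3.3, we have $y_1, \ldots, y_n \in \mathcal{P}_\alpha(x_m\alpha,r)$. Hence, $w'$ is an $n$-iterated
right hairpin completion of $x_m \cdots x_1 w$ and we conclude $w' \in \mathcal{H}^*_\alpha(L_\alpha,\ell,r)$.

Now, let $u \in \Sigma^\ell$ and $z \in \mathcal{L}_\alpha(u,\ell,r)$. There is a factorization

$$z = x_s \cdots x_1 w \overline{y_1} \cdots \overline{y_t}$$

where

1. $w \in u\,\big(\mathcal{H}^*_\alpha(L_\alpha,\ell-1,r) \cap \alpha\Sigma^*\overline{\alpha}\,\overline{u}\big) \subseteq \mathcal{H}^*_\alpha(L_\alpha,\ell,r) \cap u\alpha\Sigma^*\overline{\alpha}\,\overline{u}$,

2. $x_1, \ldots, x_s \in \mathcal{P}_\alpha(u\alpha,\ell)$, and

3. $y_1, \ldots, y_t \in \mathcal{P}_\alpha(u\alpha,r)$.

If $t = 0$, the word $z$ is an $s$-iterated left hairpin completion of $w$. Otherwise, let $n \geq 1$ be the
maximal index such that $|y_n| \geq |y_j|$ for all $1 \leq j \leq t$, and let $m$ be the maximal index such that
$|y_n| \leq |x_m|$ or $m = 0$ if no such index exists. Let $w' = x_m \cdots x_1 w \overline{y_1} \cdots \overline{y_n}$. Note that $w'$ satisfies the
conditions of the special case we discussed above and hence $w' \in \mathcal{H}^*_\alpha(L_\alpha,\ell,r)$.
With $u' = y_n$ we obtain

$$z = x_s \cdots x_{m+1} w' \overline{y_{n+1}} \cdots \overline{y_t}$$

where, by the choice of $n$, $m$ and by Lemma 3.3,

1. $w' \in \mathcal{H}^*_\alpha(L_\alpha,\ell,r) \cap u'\alpha\Sigma^*\overline{\alpha}\,\overline{u'}$,

2. $x_{m+1}, \ldots, x_s \in \mathcal{P}_\alpha(u'\alpha,\ell)$, and

3. $y_{n+1}, \ldots, y_t \in \mathcal{P}_\alpha(u'\alpha,r)$.

At this point we may continue inductively and deduce $z \in \mathcal{H}^*_\alpha(L_\alpha,\ell,r)$. □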

The previous lemma tells us that, if $\ell \geq r$, the iterated parameterized hairpin completion of $L_\alpha$ can
be represented by

$$\mathcal{H}^*_\alpha(L_\alpha,\ell,r) = \mathcal{H}^*_\alpha(L_\alpha,\ell-1,r) \cup \bigcup_{u \in \Sigma^\ell} \mathcal{L}_\alpha(u,\ell,r).$$

Symmetrically, if $r > \ell$, let us define

$$\mathcal{R}_\alpha(u,\ell,r) = \mathcal{P}_\alpha(u\alpha,\ell)^*\, \big(\mathcal{H}^*_\alpha(L_\alpha,\ell,r-1) \cap u\alpha\Sigma^*\overline{\alpha}\big)\, \overline{u}\, \overline{\mathcal{P}_\alpha(u\alpha,r)^*}.$$

The iterated parameterized hairpin completion of $L_\alpha$ can be represented by

$$\mathcal{H}^*_\alpha(L_\alpha,\ell,r) = \mathcal{H}^*_\alpha(L_\alpha,\ell,r-1) \cup \bigcup_{u \in \Sigma^r} \mathcal{R}_\alpha(u,\ell,r).$$

We conclude that the iterated parameterized hairpin completion of a language $L$ can be represented
by an expression using $L$ and the operations union, intersection with regular sets, and
concatenation with regular sets.

4 The size of NFAs accepting iterated parameterized hairpin completions

Let $L$ be a regular language and $\ell, r \in \mathbb{N}$ be finite bounds. In this section we analyze the size of
NFAs accepting the iterated parameterized hairpin completion $\mathcal{H}^*_k(L,\ell,r)$ with respect to the size
of an NFA accepting $L$ and the bounds $\ell$ and $r$. By the size of an NFA we mean its number of
states. Recall that $k$ is treated as a constant. (Assuming $k \leq \ell$ or $k \leq r$ would induce the same
complexity, but this is not shown here.) Our results are the following.

Theorem 4.1.

1. Let $m \geq 1$. There is a regular language $L$ such that neither the language $\mathcal{H}_k(L,m,m)$ nor
the language $\mathcal{H}^*_k(L,m,m)$ can be accepted by an NFA with less than $2^m$ states.

2. Let $L$ be a regular language which is accepted by an NFA of size $n$. Let $\ell, r \in \mathbb{N}$ and let
$m = \max\{\ell, r\}$. There is an NFA accepting the iterated parameterized hairpin completion
$\mathcal{H}^*_k(L,\ell,r)$ whose size is in $2^{O(m^2)} n$.

Proof of 1. Let $\Sigma = \{a, \overline{a}, b, \overline{b}, c, \overline{c}\}$ and $L = c\{a,b\}^* a^k \overline{a}^k$. For any word $w \in L$ there is no
possibility of building a left hairpin and the only possible right hairpin is to bind the suffix $\overline{a}^k$ to
$a^k$ if $|w| \leq m + 2k$. Therefore, we have

$$\mathcal{H}_k(L,m,m) = \{c v a^k \overline{a}^k \overline{v}\, \overline{c} \mid v \in \{a,b\}^{\leq m-1}\}.$$

Now let $w = c v a^k \overline{a}^k \overline{v}\, \overline{c}$ with $v \in \{a,b\}^{\leq m-1}$. The only way to build a hairpin is to bind its
prefix to its suffix, hence

$$\mathcal{H}^*_k(L,m,m) = L \cup \mathcal{H}_k(L,m,m).$$

We claim that an NFA accepting $\mathcal{H}_k(L,m,m)$ or $\mathcal{H}^*_k(L,m,m)$ has a size of at least $2^m$. We
prove the claim for the language $\mathcal{H}_k(L,m,m)$; the argumentation for $\mathcal{H}^*_k(L,m,m)$ is exactly the
same.

Consider an NFA accepting $\mathcal{H}_k(L,m,m)$ and let $Q$ denote its set of states. For a word $u \in \Sigma^*$ we
denote by $P(u) \subseteq Q$ the set of states which are reachable from an initial state with a path labelled
by $u$. Now let $v \in \{a,b\}^{\leq m-1}$. Since $c v a^k \overline{a}^k \overline{v}\, \overline{c} \in \mathcal{H}_k(L,m,m)$, there is a state $q \in P(c v a^k \overline{a}^k)$ such
that a path from $q$ to a final state exists which is labelled by $\overline{v}\, \overline{c}$. For all words $u \in \{a,b\}^{\leq m-1}$
with $u \neq v$ the state $q$ does not belong to $P(c u a^k \overline{a}^k)$ because $c u a^k \overline{a}^k \overline{v}\, \overline{c} \notin \mathcal{H}_k(L,m,m)$. Each
word $v \in \{a,b\}^{\leq m-1}$ yields such a state $q$, they are mutually different, and none of them is an
initial state (as $\overline{v}\, \overline{c} \notin \mathcal{H}_k(L,m,m)$). Therefore, the number of states $|Q|$ has to be greater than
$|\{a,b\}^{\leq m-1}| = 2^m - 1$. □
In order to prove the second claim of Theorem 4.1 we implicitly use some well-known constructions
of NFAs which accept concatenation, union, or intersection of regular languages. Consider
two NFAs which accept the languages $L_1$, $L_2$ and which are of size $n_1$, $n_2$, respectively. There is
an NFA accepting the concatenation $L_1 L_2$ which is of size $n_1 + n_2$, an NFA accepting the union
$L_1 \cup L_2$ which is of size $n_1 + n_2$, and an NFA accepting the intersection $L_1 \cap L_2$ which is of size
$n_1 \cdot n_2$. For details on how these NFAs are constructed see, e.g., [7].
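The size bound $n_1 \cdot n_2$ for the intersection comes from the standard product construction. A minimal Python sketch is given below; it assumes an NFA is stored as a tuple `(states, edges, initial, final)` with `edges` a set of triples `(p, letter, q)`, mirroring the tuple $(Q, \Sigma, E, I, F)$ without the alphabet. The function name `intersect` and the two sample automata are illustrative only.

```python
from itertools import product


def intersect(nfa1, nfa2):
    """Product NFA for the intersection; it has |Q1| * |Q2| states."""
    states1, edges1, initial1, final1 = nfa1
    states2, edges2, initial2, final2 = nfa2
    states = set(product(states1, states2))
    # Both components must read the same letter simultaneously.
    edges = {
        ((p1, p2), a, (q1, q2))
        for (p1, a, q1) in edges1
        for (p2, b, q2) in edges2
        if a == b
    }
    initial = set(product(initial1, initial2))
    final = set(product(final1, final2))
    return states, edges, initial, final


# Words over {a, b} that contain at least one a.
contains_a = ({0, 1},
              {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "a", 1), (1, "b", 1)},
              {0}, {1})
# Words over {a, b} that end with b.
ends_with_b = ({0, 1},
               {(0, "a", 0), (0, "b", 0), (0, "b", 1)},
               {0}, {1})

if __name__ == "__main__":
    states, edges, initial, final = intersect(contains_a, ends_with_b)
    print(len(states), len(edges))
```

The sketch keeps all product states, including unreachable ones, because the bound in the text counts states of the full product.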

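Proof of 2. Let $L$ be a regular language which is accepted by an automaton of size $n$ and let
$\ell, r \in \mathbb{N}$. The parameterized hairpin completion of $L$ is given by

$$\mathcal{H}_k(L,\ell,r) = \bigcup_{\alpha \in \Sigma^k} \bigcup_{\gamma \in \Sigma^{\leq \ell}} \gamma\,(\alpha\Sigma^*\overline{\alpha}\,\overline{\gamma} \cap L) \;\cup\; \bigcup_{\alpha \in \Sigma^k} \bigcup_{\gamma \in \Sigma^{\leq r}} (\gamma\alpha\Sigma^*\overline{\alpha} \cap L)\,\overline{\gamma}.$$

For $\gamma, \alpha \in \Sigma^*$ there is an NFA accepting $\gamma\,(\alpha\Sigma^*\overline{\alpha}\,\overline{\gamma} \cap L)$ which has a size in $O(|\gamma\alpha| \cdot n)$. Hence, the
parameterized hairpin completion of $L$ can be accepted by an NFA which has a size in
$O(|\Sigma|^m m \cdot n) \subseteq 2^{O(m)} n$ where $m = \max\{\ell, r\}$.

For $\alpha \in \Sigma^k$ the language $L_\alpha = \mathcal{H}_k(L,\ell,r) \cap \alpha\Sigma^*\overline{\alpha}$ can also be accepted by an NFA which has
a size in $2^{O(m)} n$. Let $N_{i,j}$ denote the minimal size of an NFA accepting $\mathcal{H}^*_\alpha(L_\alpha,i,j)$ for $i, j \in \mathbb{N}$.
Since $\mathcal{H}^*_\alpha(L_\alpha,0,0) = L_\alpha$, we have $N_{0,0} \in 2^{O(m)} n$. For $i \geq j$ let us recall that

$$\mathcal{H}^*_\alpha(L_\alpha,i,j) = \mathcal{H}^*_\alpha(L_\alpha,i-1,j) \cup \bigcup_{u \in \Sigma^i} \mathcal{L}_\alpha(u,i,j),$$

$$\mathcal{L}_\alpha(u,i,j) = \mathcal{P}_\alpha(u\alpha,i)^*\, u\, \big(\mathcal{H}^*_\alpha(L_\alpha,i-1,j) \cap \alpha\Sigma^*\overline{\alpha}\,\overline{u}\big)\, \overline{\mathcal{P}_\alpha(u\alpha,j)^*}.$$

The size of a minimal NFA accepting $\mathcal{L}_\alpha(u,i,j)$ is in $O(i \cdot N_{i-1,j})$ whence

$$N_{i,j} \in O(|\Sigma|^i\, i \cdot N_{i-1,j}) \subseteq 2^{O(i)} N_{i-1,j}.$$

Symmetrically, for $j > i$ we have $N_{i,j} \in 2^{O(j)} N_{i,j-1}$. By unfolding the recursion we obtain

$$N_{\ell,r} \in \prod_{i=1}^{\ell} 2^{O(i)} \cdot \prod_{j=1}^{r} 2^{O(j)} \cdot 2^{O(m)} n \subseteq 2^{O(\sum_{i=1}^{m} i)} n = 2^{O(m^2)} n.$$

Now, the iterated parameterized hairpin completion is given by

$$\mathcal{H}^*_k(L,\ell,r) = L \cup \bigcup_{\alpha \in \Sigma^k} \mathcal{H}^*_\alpha(L_\alpha,\ell,r)$$

and there is an NFA accepting $\mathcal{H}^*_k(L,\ell,r)$ which has a size in $O(N_{\ell,r} + n) \subseteq 2^{O(m^2)} n$. □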

Statement 2 of Theorem 4.1 also yields an algorithm to solve the membership problem for the 
iterated bounded hairpin completion of a regular language. 

Corollary 4.2. Let $L$ be a regular language, given by an NFA of size $n$, and let $\ell, r \in \mathbb{N}$. The
problem whether an input word $w$ belongs to $\mathcal{H}^*_k(L,\ell,r)$ can be decided in linear time $c \cdot |w|$, where
the constant $c$ depends on the size $n$ and the bounds $\ell, r$. More precisely, for $m = \max\{\ell, r\}$ we
have $c \in 2^{O(m^2)} n^2$.

Proof. Following the proof of Statement 2 of Theorem 4.1, we can construct an NFA
$A = (Q, \Sigma, E, I, F)$ accepting the iterated hairpin completion $\mathcal{H}^*_k(L,\ell,r)$ which is of a size in $2^{O(m^2)} n$.
Let us denote the size of this NFA by $N$. Note that the construction can be performed in time
$O(|E|) \subseteq O(N^2) \subseteq 2^{O(m^2)} n^2$.

The input $w$ can be accepted by an online power-set construction of the NFA $A$: We start with
the set of states $P_0 = I$. When we read the $i$-th letter $a$ of the input $w$ we construct the set $P_i$ by
following all outgoing edges of states in $P_{i-1}$ which are labelled by $a$. As every state in $P_{i-1}$ has
at most $N$ outgoing edges labelled by $a$, one step can be performed in $O(N^2) \subseteq 2^{O(m^2)} n^2$ time.
The algorithm stops after $w$ is read and $P_{|w|}$ is computed. The input $w$ belongs to $\mathcal{H}^*_k(L,\ell,r)$ if
and only if $P_{|w|}$ contains a final state from $F$. □
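The online power-set construction from the proof can be sketched in a few lines of Python. This is an illustration only: the NFA is passed as a set of edge triples `(p, letter, q)` together with the sets of initial and final states, and the function name `accepts` and the sample automaton are assumptions made for the example, not part of the construction of Theorem 4.1.

```python
from collections import defaultdict


def accepts(edges, initial, final, word):
    """Simulate an NFA on word by tracking the set of reachable states."""
    delta = defaultdict(set)
    for p, a, q in edges:
        delta[(p, a)].add(q)

    current = set(initial)  # P_0 = I
    for letter in word:
        # P_i: follow every edge labelled by the letter that leaves a state of P_{i-1}
        current = {q for p in current for q in delta.get((p, letter), ())}
    return not current.isdisjoint(final)


# Sample NFA: words over {a, b} that end with "ab".
EDGES = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2)}

if __name__ == "__main__":
    for word in ["ab", "aab", "aba", ""]:
        print(repr(word), accepts(EDGES, {0}, {2}, word))
```

Each letter is processed once and the work per letter depends only on the automaton, which is the reason the running time is linear in $|w|$ for a fixed NFA.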

So far, the best known time complexity of the membership problem for the iterated (un- 
bounded) hairpin completion of a regular language L is quadratic with respect to the length of 
the input word, by an algorithm from [14]. This algorithm can easily be adapted to solve the 
membership problem for the iterated bounded hairpin completion in quadratic time. Hence, if 
we measure the time complexity with respect to the length of the input word only, we have an 
improvement from quadratic to linear time (in the bounded case). 

5 The Iterated Hairpin Completion of Singletons 

The class of iterated hairpin completions of singletons is defined as

$$\mathrm{HCS}_k = \{\mathcal{H}^*_k(\{w\}) \mid w \in \Sigma^*\}.$$

We solve the problem whether $\mathrm{HCS}_k$ includes non-regular or non-context-free languages, which
was asked in [17]. Furthermore, we will show that the result also holds if we consider the iterated
one-sided hairpin completion.

Let us recall that, as we are treating the unbounded hairpin completion now, for the usual
factorization $\gamma\alpha\beta\overline{\alpha}\,\overline{\gamma}$ of a hairpin completion, the length of the factor $\gamma$ is not bounded by a
constant anymore. By the results of the previous section it is obvious that the possibility of
creating arbitrarily long prefixes and suffixes plays an essential role in the following proof.

Theorem 5.1. The iterated one- and two-sided hairpin completions of a singleton are in NL but
not context-free, in general.

Proof. The membership in NL follows from the fact that NL is closed under iterated hairpin
completion, which has been proved in [1]. For convenience, we give a sketch of the proof here.

Consider a language $L \in$ NL. The iterated hairpin completion $\mathcal{H}^*_k(L)$ can be accepted by a
non-deterministic Turing machine that works as follows. We use two pointers $i$ and $j$ which mark
the beginning and the end of a factor of the input $w$, respectively. By $w(i,j)$ we denote the factor
beginning at position $i$ and ending at position $j$.

1. We start with $i = 1$ and $j = |w|$.

2. Non-deterministically either continue with step 3 or skip to step 5.

3. Either guess $i'$ such that $i < i' < j$ and verify that $w(i,j)$ is a left hairpin completion of
$w(i',j)$ or guess $j'$ such that $i < j' < j$ and verify that $w(i,j)$ is a right hairpin completion
of $w(i,j')$. If the verification is successful, continue with $i = i'$ (resp. $j = j'$).

4. Repeat step 2.

5. Accept if and only if $w(i,j) \in L$.

Obviously, this Turing machine accepts $\mathcal{H}^*_k(L)$. In order to perform steps 1-4, we only have to
store some pointers on the input word; this can be done in $\log |w|$ space. Since $L \in$ NL, step 5 can
be performed in $\log |w|$ space, too, and hence $\mathcal{H}^*_k(L) \in$ NL.

For the one-sided hairpin completion $\mathcal{RH}^*_k(L)$ we can use almost the same algorithm. The only
difference is that the pointer $i$ always is 1.
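The pointer-based procedure can be simulated deterministically by exploring all pairs of pointers instead of guessing. The following Python sketch is an illustration and not the log-space machine itself: it assumes that complementation is modelled by switching letter case, it uses 0-based half-open index pairs instead of the 1-based positions in the text, and the names `comp`, `is_right_completion`, and `in_iterated_completion` are hypothetical. Membership in $L$ is supplied as a predicate `in_language`.

```python
from functools import lru_cache


def comp(word):
    """Read the word from right to left and complement every letter (a <-> A)."""
    return word[::-1].swapcase()


def is_right_completion(x, g, k):
    """True if x = y + comp(gamma) with y = gamma alpha beta comp(alpha), |gamma| = g, |alpha| = k."""
    y = x[:len(x) - g]
    return (len(y) >= g + 2 * k
            and y[:g] == comp(x[len(x) - g:])
            and y[g:g + k] == comp(y[len(y) - k:]))


def in_iterated_completion(word, k, in_language):
    """Decide whether word is an iterated (two-sided) hairpin completion of a word in L."""

    @lru_cache(maxsize=None)
    def search(i, j):
        x = word[i:j]
        if in_language(x):                      # step 5
            return True
        for g in range(1, (len(x) - 2 * k) // 2 + 1):
            # x is a right hairpin completion of word[i:j-g]
            if is_right_completion(x, g, k) and search(i, j - g):
                return True
            # x is a left completion of word[i+g:j] iff comp(x) is a right completion
            if is_right_completion(comp(x), g, k) and search(i + g, j):
                return True
        return False

    return search(0, len(word))


if __name__ == "__main__":
    singleton = lambda x: x == "abaA"
    for z in ["abaA", "abaABA", "ababaABA", "abaAB"]:
        print(z, in_iterated_completion(z, 1, singleton))
```

Because there are only quadratically many pointer pairs, memoizing `search` keeps the number of explored configurations polynomial in $|w|$.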

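Now, let $\Sigma = \{a, \overline{a}, b, \overline{b}, c, \overline{c}\}$, $\alpha = a^k$, and

$$w = \alpha b \alpha \overline{\alpha} \alpha c \overline{\alpha}.$$

We will prove that $\mathcal{H}^*_k(\{w\})$ and $\mathcal{RH}^*_k(\{w\})$ are not context-free.

Since context-free languages are closed under intersection with regular languages, it suffices
to show for a regular language $R$ that the intersections $R \cap \mathcal{H}^*_k(\{w\})$ and $R \cap \mathcal{RH}^*_k(\{w\})$ are not
context-free. Let $u = \overline{b}\,\overline{\alpha}$ and $v = \alpha\overline{\alpha}\,\overline{b}\,\overline{\alpha}$. Note that $\overline{u}\alpha \leq \overline{v}\alpha \leq w$. Define

$$R = w u^+ v \overline{u}^+ \overline{w}\, \overline{u}^+ \overline{w}$$

and consider a word $z \in R$:

$$z = \alpha b \alpha \overline{\alpha} \alpha c \overline{\alpha}\, (\overline{b}\,\overline{\alpha})^r\, \alpha\overline{\alpha}\,\overline{b}\,\overline{\alpha}\, (\alpha b)^s\, \alpha\overline{c}\,\overline{\alpha}\alpha\overline{\alpha}\,\overline{b}\,\overline{\alpha}\, (\alpha b)^t\, \alpha\overline{c}\,\overline{\alpha}\alpha\overline{\alpha}\,\overline{b}\,\overline{\alpha}$$

with $r, s, t \geq 1$. At first, note that $w$ is a prefix of $z$ and it does not occur as another factor in
$z$ (there is only one $c$ in $z$). Thus, if $z$ belongs to $\mathcal{H}^*_k(\{w\})$, it must be an iterated right hairpin
completion of $w$ and hence

$$R \cap \mathcal{H}^*_k(\{w\}) = R \cap \mathcal{RH}^*_k(\{w\}).$$

Next, we will show that $z$ is an iterated hairpin completion of $w$ if and only if $r = s = t$. The
proof is a straightforward construction of $z$. We try to find a sequence $w = w_0, w_1, \ldots, w_n = z$
for some $n \geq 0$ where $w_i \neq w_{i-1}$ is a right hairpin completion of $w_{i-1}$ for $1 \leq i \leq n$. This implies
that every $w_i$ is a prefix of $z$.

Fortunately, for each of the words $w_0, \ldots, w_{r+1}$ there is exactly one choice which satisfies these
conditions:

$$w_0 = w = \alpha b \alpha \overline{\alpha} \alpha c \overline{\alpha}$$
$$w_1 = wu = \alpha b \alpha \overline{\alpha} \alpha c \overline{\alpha}\, \overline{b}\,\overline{\alpha}$$
$$w_2 = wu^2 = \alpha b \alpha \overline{\alpha} \alpha c \overline{\alpha}\, (\overline{b}\,\overline{\alpha})^2$$
$$\vdots$$
$$w_r = wu^r = \alpha b \alpha \overline{\alpha} \alpha c \overline{\alpha}\, (\overline{b}\,\overline{\alpha})^r$$
$$w_{r+1} = wu^r v = \alpha b \alpha \overline{\alpha} \alpha c \overline{\alpha}\, (\overline{b}\,\overline{\alpha})^r\, \alpha\overline{\alpha}\,\overline{b}\,\overline{\alpha}$$

If $s \neq r$, none of the right hairpin completions of $w_{r+1}$ is a prefix of $z$ (except for $w_{r+1}$ itself).
Otherwise, we find exactly one right hairpin completion which satisfies the conditions:

$$w_{r+2} = wu^r v \overline{u}^r \overline{w} = \alpha b \alpha \overline{\alpha} \alpha c \overline{\alpha}\, (\overline{b}\,\overline{\alpha})^r\, \alpha\overline{\alpha}\,\overline{b}\,\overline{\alpha}\, (\alpha b)^r\, \alpha\overline{c}\,\overline{\alpha}\alpha\overline{\alpha}\,\overline{b}\,\overline{\alpha}.$$

The argument for the last step is the same. If and only if $t = r$, we find a prefix of $z$ which is
a right hairpin completion of $w_{r+2}$ and this is $w_{r+3} = z$.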
We conclude that $z$ is an iterated hairpin completion of $w$ if and only if $r = s = t$ and hence

$$R \cap \mathcal{H}^*_k(\{w\}) = \{w u^r v \overline{u}^r \overline{w}\, \overline{u}^r \overline{w} \mid r \geq 1\}.$$

The intersection $R \cap \mathcal{H}^*_k(\{w\})$ belongs to a family of context-sensitive languages which are well
known to be non-context-free. From this it follows that $\mathcal{H}^*_k(\{w\})$ and $\mathcal{RH}^*_k(\{w\})$ are
non-context-free, too. □
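The case analysis in the proof can be replayed mechanically for small parameters. The sketch below follows the proof idea of searching forward through right hairpin completions that remain prefixes of $z$. It rests on several assumptions that are not part of the paper: complementation is modelled by switching letter case, $k = 1$ is used, and the words are read from the proof as $w = \alpha b \alpha \overline{\alpha} \alpha c \overline{\alpha}$, $u = \overline{\alpha b}$, and $v = \overline{\alpha b \alpha \overline{\alpha}}$. The function names are illustrative.

```python
def comp(word):
    """Read the word from right to left and complement every letter (a <-> A)."""
    return word[::-1].swapcase()


def in_iterated_right_completion(z, w, k):
    """Forward search over right hairpin completions of w that stay prefixes of z."""
    if not z.startswith(w):
        return False
    seen = {len(w)}          # every intermediate word is z[:n], so its length n identifies it
    stack = [len(w)]
    while stack:
        n = stack.pop()
        if n == len(z):
            return True
        x = z[:n]
        for g in range(1, n - 2 * k + 1):
            # x = gamma alpha beta comp(alpha) with |gamma| = g
            if x[g:g + k] != comp(x[n - k:]):
                continue
            m = n + g
            if m <= len(z) and m not in seen and z[n:m] == comp(x[:g]):
                seen.add(m)
                stack.append(m)
    return False


K = 1
ALPHA = "a" * K
W = ALPHA + "b" + ALPHA + comp(ALPHA) + ALPHA + "c" + comp(ALPHA)
U = comp(ALPHA + "b")
V = comp(ALPHA + "b" + ALPHA + comp(ALPHA))


def z_word(r, s, t):
    """The word of R with exponents r, s, t."""
    return W + U * r + V + comp(U) * s + comp(W) + comp(U) * t + comp(W)


if __name__ == "__main__":
    for r in range(1, 4):
        for s in range(1, 4):
            for t in range(1, 4):
                member = in_iterated_right_completion(z_word(r, s, t), W, K)
                assert member == (r == s == t)
    print("checked all exponents in 1..3")
```

The search only ever inspects prefixes of $z$, which matches the observation in the proof that every $w_i$ is a prefix of $z$.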



6 Final Remarks and Open Problems 

We proved that language classes which have very basic closure properties are closed under iterated
bounded hairpin completion. With the techniques used in our proof we obtain a better insight
into the structure of the iterated bounded hairpin completion. This might help to design new
algorithms which decide the membership of a word in the iterated bounded hairpin completion of
a given language, and also in the unbounded version, since for a given word there is an implicitly
given length bound.

Another interesting problem regarding the hairpin completion is whether the iterated hairpin
completions of two languages have a common element. Even for two given singletons it is not
known whether this problem is decidable at all, see [17]. The result of Section 5 proves that this is a
non-trivial question. However, in the bounded case we can decide this problem for two regular
languages now. We just need to create the NFAs and test whether the intersection is empty. As
the size of the NFAs is quite large with respect to the length bounds, this does not seem to be the
best way to decide the problem.

We proved the existence of non-context-free languages in the language class HCS. Here, two 
new questions arise naturally: 

1. Does a singleton exist whose iterated hairpin completion is context-free but not regular? 

2. Can we decide for a given singleton whether its iterated hairpin completion is non-regular 
(or non-context-free)? 